
    Towards an automatic data value analysis method for relational databases

    Data is becoming one of the world’s most valuable resources, and it is often suggested that those who own the data will own the future. However, despite data being an important asset, data owners struggle to assess its value. Some recent pioneering works have raised awareness of the need to measure data value, and have put forward simple but engaging survey-based methods to support first-level data assessment in an organisation. However, these methods are manual and depend on the costly input of domain experts. In this paper, we extend the manual survey-based approaches with additional metrics and dimensions derived from the evolving literature on data value dimensions and tailored specifically to our case study. We also develop an automatic, metric-based data value assessment approach that (i) automatically quantifies the business value of data in Relational Databases (RDBs), and (ii) provides a scoring method that facilitates the ranking and extraction of the most valuable RDB tables. We evaluate the proposed approach on a real-world database from a small online retailer (MyVolts) and show in our experimental study that the data value assessments made by our automated system match those produced by the domain-expert approach.
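    The scoring-and-ranking idea can be sketched as follows; the per-table metrics (`rows`, `null_ratio`, `fk_degree`, `query_hits`), the weights, and the normalisations are illustrative assumptions, not the paper's actual value dimensions:

```python
import math

# Hypothetical per-table metrics (illustrative names, not the paper's):
# rows, fraction of NULL cells, number of referencing tables, query frequency.
TABLES = {
    "customers": {"rows": 50_000, "null_ratio": 0.02, "fk_degree": 4, "query_hits": 900},
    "orders":    {"rows": 200_000, "null_ratio": 0.05, "fk_degree": 2, "query_hits": 1500},
    "audit_log": {"rows": 1_000_000, "null_ratio": 0.40, "fk_degree": 0, "query_hits": 30},
}

WEIGHTS = {"size": 0.2, "completeness": 0.3, "centrality": 0.2, "usage": 0.3}

def value_score(m):
    """Combine normalised metrics into a single [0, 1] value score."""
    size = math.log10(m["rows"] + 1) / 7            # assumes at most ~10M rows
    completeness = 1.0 - m["null_ratio"]
    centrality = min(m["fk_degree"] / 5, 1.0)
    usage = min(m["query_hits"] / 2000, 1.0)
    return (WEIGHTS["size"] * size
            + WEIGHTS["completeness"] * completeness
            + WEIGHTS["centrality"] * centrality
            + WEIGHTS["usage"] * usage)

# Rank tables by decreasing estimated business value.
ranked = sorted(TABLES, key=lambda t: value_score(TABLES[t]), reverse=True)
print(ranked)  # ['orders', 'customers', 'audit_log']
```

    The extraction step then simply keeps the top-k tables of the ranking.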

    Image Data Augmentation Approaches: A Comprehensive Survey and Future directions

    Deep learning (DL) algorithms have shown significant performance in various computer vision tasks. However, limited labelled data leads to overfitting, where a network performs poorly on unseen data compared to training data, which limits further performance improvement. To cope with this problem, various techniques have been proposed, such as dropout, normalisation and advanced data augmentation. Among these, data augmentation, which aims to enlarge the dataset by increasing sample diversity, has been a hot topic in recent times. In this article, we focus on advanced data augmentation techniques. We provide a background on data augmentation, a novel and comprehensive taxonomy of the reviewed techniques, and the strengths and weaknesses (wherever possible) of each technique. We also provide comprehensive results of the effect of data augmentation on three popular computer vision tasks: image classification, object detection and semantic segmentation. For reproducibility, we compiled the available code for all of the data augmentation techniques. Finally, we discuss the challenges, difficulties and possible future directions for the research community. We believe this survey provides several benefits: (i) readers will understand how data augmentation mitigates overfitting; (ii) the reported results will save researchers time when comparing techniques; (iii) code for the discussed data augmentation techniques is available at https://github.com/kmr2017/Advanced-Data-augmentation-codes; and (iv) the identified future directions will spark interest in the research community.
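    As a minimal illustration of the kind of augmentation the survey covers, the sketch below applies a random horizontal flip and a random crop to an image array; the operations and parameters are generic examples, not techniques from the survey's taxonomy:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_flip(img):
    """Horizontally flip the image with probability 0.5."""
    return img[:, ::-1] if rng.random() < 0.5 else img

def random_crop(img, size):
    """Crop a random size x size window from the image."""
    h, w = img.shape[:2]
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    return img[top:top + size, left:left + size]

def augment(img, crop_size=24):
    """Compose the two operations into one augmentation step."""
    return random_crop(random_flip(img), crop_size)

# A fake 32x32 RGB image; each call yields a different augmented view,
# which is how augmentation enlarges the effective dataset.
img = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
aug = augment(img)
print(aug.shape)  # (24, 24, 3)
```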

    Incorporating user preferences in multi-objective feature selection in software product lines using multi-criteria decision analysis

    Software Product Line Engineering has created various tools that assist with standardisation in the design and implementation of clusters of equivalent software systems, with an explicit representation of variability choices in the form of Feature Models, making the selection of the most suitable software product a Feature Selection problem. As the number of properties grows, the problem must be defined as a multi-objective optimisation in which objectives are considered independently of one another, with the goal of finding and providing decision-makers with a large and diverse set of non-dominated solutions/products. Following the optimisation, decision-makers define their own (often complex) preferences on what the ideal software product looks like. They then select the single solution that best matches their preferences and discard the rest, sometimes with the help of a Multi-Criteria Decision Analysis technique. In this work, we study the usability and performance of incorporating decision-makers’ preferences by carrying out Multi-Criteria Decision Analysis directly within the multi-objective optimisation, to increase the chances of finding more solutions that closely match those preferences and to avoid wasting execution time searching for non-dominated solutions that score poorly against them.
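    The post-optimisation workflow described above can be sketched with a plain weighted-sum MCDA over a Pareto front; the objectives, candidate products and weights below are invented for illustration and are not from the paper:

```python
def dominates(a, b):
    """a Pareto-dominates b (minimisation): no worse everywhere, better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(solutions):
    """Keep only the non-dominated solutions."""
    return [s for s in solutions
            if not any(dominates(o, s) for o in solutions if o is not s)]

def preference_rank(front, weights):
    """Weighted-sum MCDA: lower score = closer to the decision-maker's preferences."""
    return sorted(front, key=lambda s: sum(w * v for w, v in zip(weights, s)))

# Hypothetical objective vectors (cost, defect_risk) per candidate product.
candidates = [(3.0, 0.9), (2.0, 1.5), (4.0, 0.5), (3.5, 1.4)]
front = pareto_front(candidates)          # (3.5, 1.4) is dominated by (3.0, 0.9)
best = preference_rank(front, weights=(0.7, 0.3))[0]
print(front, best)
```

    The work's point is that moving this preference ranking inside the optimisation loop, rather than applying it afterwards as above, steers the search toward the preferred region.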

    AudRandAug: Random Image Augmentations for Audio Classification

    Data augmentation has proven to be effective in training neural networks. Recently, a method called RandAug was proposed that randomly selects data augmentation techniques from a predefined search space. RandAug has demonstrated significant performance improvements for image-related tasks while imposing minimal computational overhead. However, no prior research has explored the application of RandAug specifically to audio data augmentation, where audio is converted into an image-like representation. To address this gap, we introduce AudRandAug, an adaptation of RandAug for audio data. AudRandAug selects data augmentation policies from a dedicated audio search space. To evaluate its effectiveness, we conducted experiments using various models and datasets. Our findings indicate that AudRandAug outperforms other existing data augmentation methods in terms of accuracy. Comment: Paper accepted at the 25th Irish Machine Vision and Image Processing Conference.
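    The policy-sampling idea behind a RandAug-style method can be sketched as follows; the search space of audio operations here is a made-up stand-in for AudRandAug's dedicated audio search space:

```python
import random

random.seed(1)

# Illustrative operations on a 1-D signal (not the paper's actual policies).
def time_shift(x):
    return x[-2:] + x[:-2]          # rotate samples by two positions

def add_noise(x):
    return [v + 0.01 for v in x]    # constant offset as a noise stand-in

def gain(x):
    return [v * 1.1 for v in x]     # amplitude scaling

SEARCH_SPACE = [time_shift, add_noise, gain]

def aud_rand_aug(x, n_ops=2):
    """Apply n_ops augmentations drawn uniformly from the search space."""
    for op in random.sample(SEARCH_SPACE, n_ops):
        x = op(x)
    return x

out = aud_rand_aug([0.0, 0.1, 0.2, 0.3])
print(len(out))  # 4
```

    The key design point is that the policy is re-sampled per example, so the model sees a different random composition of operations on every pass.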

    Parallel and distributed clustering framework for big spatial data mining

    Clustering techniques are very attractive for identifying and extracting patterns of interest from datasets. However, their application to very large spatial datasets presents numerous challenges, such as high dimensionality, heterogeneity, and the high complexity of some algorithms. Distributed clustering techniques constitute a very good alternative for tackling the Big Data challenges (e.g., Volume, Variety, Veracity, and Velocity). In this paper, we developed and implemented a Dynamic Parallel and Distributed Clustering (DPDC) approach that can analyse Big Data within a reasonable response time and produce accurate results, using existing computing and storage infrastructure such as cloud computing. The DPDC approach consists of two phases: the first phase is fully parallel and generates local clusters, and the second phase aggregates the local results to obtain global clusters. The aggregation phase is designed so that the final clusters are compact and accurate while the overall process remains efficient in time and memory allocation. DPDC was thoroughly tested and compared to the well-known clustering algorithms BIRCH and CURE. The results show that the approach not only produces high-quality results but also scales very well by taking advantage of the Hadoop MapReduce paradigm or any distributed system.
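    A minimal sketch of the two-phase scheme, assuming a toy k-means for the local phase and a simple distance-threshold merge for the aggregation phase (neither is DPDC's actual algorithm):

```python
import numpy as np

def local_clusters(points, k=2, iters=10):
    """Phase 1 (runs independently, in parallel, on each node): a tiny
    k-means returning the node's local centroids."""
    rng = np.random.default_rng(0)
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        centroids = np.array([points[labels == j].mean(axis=0) for j in range(k)])
    return centroids

def aggregate(local_centroids, merge_dist):
    """Phase 2: merge local centroids closer than merge_dist into global clusters."""
    groups = []
    for c in local_centroids:
        for g in groups:
            if np.linalg.norm(c - g["sum"] / g["n"]) < merge_dist:
                g["sum"] = g["sum"] + c
                g["n"] += 1
                break
        else:
            groups.append({"sum": c.astype(float).copy(), "n": 1})
    return [g["sum"] / g["n"] for g in groups]

# Two simulated sites, each holding points around (0, 0) and (10, 10).
site1 = np.array([[0.0, 0.0], [0.5, 0.0], [10.0, 10.0], [10.0, 10.5]])
site2 = np.array([[0.2, 0.1], [0.0, 0.4], [9.8, 10.0], [10.2, 9.9]])

locals_ = list(local_clusters(site1)) + list(local_clusters(site2))
global_centroids = aggregate(locals_, merge_dist=2.0)
print(len(global_centroids))  # 2
```

    Only the small set of local centroids crosses the network in phase 2, which is what keeps the aggregation cheap relative to shipping raw data.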

    Orchestration from the cloud to the edge

    The effective management of complex and heterogeneous computing environments is one of the biggest challenges that service and infrastructure providers face in the Cloud-to-Thing continuum era. Advanced orchestration systems are required to support the resource management of large-scale cloud data centres integrated with the big data generated by IoT devices. The orchestration system should be aware of all available resources and their current status in order to perform dynamic allocations and enable rapid deployment of applications. This chapter reviews the state of the art in orchestration along the Cloud-to-Thing continuum, with a specific emphasis on container-based orchestration (e.g. Docker Swarm and Kubernetes) and fog-specific orchestration architectures (e.g. SORTS, SOAFI, ETSI ISG MEC, and CONCERT).

    Efficient Large Scale Clustering based on Data Partitioning

    3rd IEEE International Conference on Data Science and Advanced Analytics (DSAA 2016), Montreal, Canada, 17-19 October 2016
    Clustering techniques are very attractive for extracting and identifying patterns in datasets. However, their application to very large spatial datasets presents numerous challenges, such as high-dimensional data, heterogeneity, and the high complexity of some algorithms. For instance, some algorithms may have linear complexity but require domain knowledge to determine their input parameters. Distributed clustering techniques constitute a very good alternative for tackling the big data challenges (e.g., Volume, Variety, Veracity, and Velocity). Usually these techniques consist of two phases: the first phase generates local models or patterns, and the second aggregates the local results to obtain global models. While the first phase can be executed in parallel on each site, and is therefore efficient, the aggregation phase is complex and time consuming and may produce incorrect and ambiguous global clusters, and therefore incorrect models. In this paper we propose a new distributed clustering approach to deal efficiently with both phases: the generation of local results and the generation of global models by aggregation. For the first phase, our approach is capable of analysing the datasets located at each site using different clustering techniques. The aggregation phase is designed so that the final clusters are compact and accurate while the overall process remains efficient in time and memory allocation. For the evaluation, we use two well-known clustering algorithms: K-Means and DBSCAN. One of the key outcomes of this distributed clustering technique is that the number of global clusters is dynamic; it does not need to be fixed in advance. Experimental results show that the approach is scalable and produces high-quality results.
    Science Foundation Ireland
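    The dynamic number of global clusters can be illustrated with a single-linkage merge over local-cluster representative points, where no global cluster count is fixed in advance; this is a simplified stand-in for the paper's aggregation phase, with invented data:

```python
def merge_local_clusters(reps, eps=1.5):
    """Single-linkage merge of local-cluster representatives via union-find.
    The number of global clusters emerges from the data, not from a preset k."""
    parent = list(range(len(reps)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(reps)):
        for j in range(i + 1, len(reps)):
            d = ((reps[i][0] - reps[j][0]) ** 2
                 + (reps[i][1] - reps[j][1]) ** 2) ** 0.5
            if d < eps:                    # representatives close enough: same cluster
                parent[find(i)] = find(j)

    return len({find(i) for i in range(len(reps))})

# Representative points reported by three sites (two natural groups).
reps = [(0.0, 0.0), (0.4, 0.2), (5.0, 5.0), (5.3, 4.9), (0.1, 0.5)]
print(merge_local_clusters(reps))  # 2
```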

    Application of blockchain technology to 5G-enabled vehicular networks: survey and future directions

    Blockchain is disrupting several sectors as it continues to go mainstream. Interest in Blockchain is growing across application domains looking to take advantage of its immutability, security, cost-saving, transparency and fast-processing properties. Blockchain has empowered several sectors to upgrade their existing systems or undertake an entire architectural shift. For instance, Blockchain has enabled IoT systems to improve their quality of service while simultaneously meeting their security requirements. In particular, several works apply Blockchain to manage trust in 5G-enabled autonomous vehicular systems, to ensure secure vehicle authentication and handover, guarantee message integrity, and provide an irrefutable vehicle reputation record. Vehicular network systems require proper data storage management, highly secure transactions, and non-interfering networks. Blockchain’s immutability, tamper resistance, and security by design make it a suitable candidate technology for 5G vehicular network systems. In this paper we present a methodical literature analysis of the application of Blockchain to 5G vehicular networks, covering architectures and technical aspects. We also highlight and discuss issues and challenges facing the application of Blockchain technology to 5G vehicular networks.
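    The irrefutable-reputation-record use case can be illustrated with a minimal hash-linked chain; this toy sketch only demonstrates the tamper-evidence property and is not a stand-in for a real Blockchain platform (no consensus, no distribution):

```python
import hashlib
import json

def block_hash(block):
    """Deterministic SHA-256 digest of a block's canonical JSON form."""
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

def append_record(chain, record):
    """Append a reputation record, linking it to the previous block's hash."""
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"prev": prev, "record": record})
    return chain

def verify(chain):
    """Recompute the hash links; any tampered record breaks the chain."""
    return all(chain[i]["prev"] == block_hash(chain[i - 1])
               for i in range(1, len(chain)))

chain = []
append_record(chain, {"vehicle": "V-42", "event": "valid handover", "score": 1})
append_record(chain, {"vehicle": "V-42", "event": "bogus message", "score": -5})
print(verify(chain))                 # True
chain[0]["record"]["score"] = 99     # retroactive tampering...
print(verify(chain))                 # False: the hash link no longer matches
```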